Fix for unmarshalling error for keepequal/dropequal #2794
Conversation
cmd/otel-allocator/server/server.go
var jobToScrapeConfig map[string]interface{}
err = json.Unmarshal(jsonConfig, &jobToScrapeConfig)
if jobToScrapeConfig != nil {
Instead of doing this scary-looking surgery, we should probably do something similar to what prometheus-operator does, and manually build the scrapeconfig out of appropriate yaml structs: https://github.com/prometheus-operator/prometheus-operator/blob/4e3b2bcea44cbfdbd5c4975d777240285721dc6b/pkg/prometheus/promcfg.go#L1645
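For context, a minimal sketch of the kind of approach being suggested, assuming gopkg.in/yaml.v2 and made-up job and relabel values; it only illustrates the yaml.MapSlice style used in the linked prometheus-operator code, not the operator's actual config generation:

// Illustrative sketch: build a scrape config by hand as ordered YAML structs
// (yaml.MapSlice), in the spirit of the linked prometheus-operator code.
// The job name, interval, and relabel rule are made up.
package main

import (
	"fmt"

	yaml "gopkg.in/yaml.v2"
)

func generateScrapeConfig(jobName string) yaml.MapSlice {
	return yaml.MapSlice{
		{Key: "job_name", Value: jobName},
		{Key: "scrape_interval", Value: "30s"},
		{Key: "relabel_configs", Value: []yaml.MapSlice{
			{
				{Key: "source_labels", Value: []string{"__meta_kubernetes_pod_label_app"}},
				{Key: "target_label", Value: "app"},
				{Key: "action", Value: "replace"},
			},
		}},
	}
}

func main() {
	out, err := yaml.Marshal(generateScrapeConfig("example-job"))
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out))
}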
Thanks @swiatekm-sumo. But that would mean that whenever there is an update to the scrape config, we would need to manually update this as well, which seems tedious. Also, I don't see any libraries I can reuse. Until this is fixed in Prometheus, could this be considered as a fix, since the marshal implementation simplifies the conversion and we can rely on the interface implementation to reliably convert to YAML? Once that is done, we can just remove this.
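To make the argument concrete, a rough sketch of what relying on Prometheus' own marshal implementation could look like. This is an illustration with a made-up helper name, not the exact change in this PR; yaml.v3 is used here only because it decodes nested mappings with string keys, which keeps the result directly JSON-encodable:

package main

import (
	"encoding/json"
	"fmt"

	promconfig "github.com/prometheus/prometheus/config"
	"gopkg.in/yaml.v3"
)

// scrapeConfigToJSON is a hypothetical helper: it serializes a ScrapeConfig
// through Prometheus' own YAML marshallers (so defaults, regexes, secrets,
// and service discovery configs are handled by the upstream implementations)
// and re-encodes the generic result as JSON for the allocator's HTTP endpoint.
func scrapeConfigToJSON(sc *promconfig.ScrapeConfig) ([]byte, error) {
	yamlBytes, err := yaml.Marshal(sc)
	if err != nil {
		return nil, err
	}
	var generic map[string]interface{}
	if err := yaml.Unmarshal(yamlBytes, &generic); err != nil {
		return nil, err
	}
	return json.Marshal(generic)
}

func main() {
	out, err := scrapeConfigToJSON(&promconfig.ScrapeConfig{JobName: "example"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}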
@swiatekm-sumo - Following up. Please let me know your thoughts. Thanks!
I understand that your approach fixes this specific problem, but it still leaves us uncertain whether there are other, similar problems we have yet to encounter. I'm not against merging this specific fix as-is, but I'd like to look for a more systemic solution, where we can be assured our serialized scrape configs are correct. @jaronoff97 @pavolloffay wdyt?
Thanks @swiatekm-sumo. As part of this fix, I also tested all the relabel actions available in the Prometheus configuration. These two were the only actions that cause problems, and since the issue comes from a specific bug, it is unlikely to recur. If other marshalling issues do come up, they would also affect all Prometheus users who depend on it, and the ideal fix would be in Prometheus' marshalling implementation. We should be able to reliably depend on the native Prometheus marshalling/unmarshalling implementation without duplicating effort. Please let me know what you think. Thanks again!
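For readers following along, a standalone sketch of the kind of round trip that exposes the problem, as far as the upstream behaviour is understood; this is not code from the PR, and the failure only occurs on Prometheus versions affected by the linked issue:

// A keepequal rule parses fine as a user would write it, but once it is
// marshalled back out with defaults (regex/separator/replacement) written
// explicitly, re-parsing it fails validation on affected Prometheus versions.
// yaml.v2 is used because Prometheus' custom (un)marshallers target it.
package main

import (
	"fmt"

	"github.com/prometheus/prometheus/model/relabel"
	yaml "gopkg.in/yaml.v2"
)

func main() {
	in := []byte(`
source_labels: [__tmp_port, __meta_kubernetes_pod_container_port_number]
target_label: __tmp_port
action: keepequal
`)
	var cfg relabel.Config
	if err := yaml.Unmarshal(in, &cfg); err != nil {
		panic(err) // the user-written form is valid
	}

	out, err := yaml.Marshal(&cfg)
	if err != nil {
		panic(err)
	}
	fmt.Printf("re-marshalled config:\n%s\n", out)

	// On affected versions this second Unmarshal returns a validation error,
	// which is what ends up breaking the collector side.
	var roundTripped relabel.Config
	fmt.Println("round-trip error:", yaml.Unmarshal(out, &roundTripped))
}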
Are we able to write unit tests checking this?
Sure @swiatekm-sumo. I have added tests checking that unmarshalling doesn't result in errors for any supported Prometheus relabel action. Please take a look. Thank you!
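Illustrative only: a table-driven test in the spirit of what is described, checking that a minimal rule for each relabel action (including keepequal/dropequal) unmarshals without error. The rules below are made-up examples; the real tests live in the allocator's server package and also exercise its serialization path:

package main

import (
	"testing"

	"github.com/prometheus/prometheus/model/relabel"
	"github.com/stretchr/testify/require"
	yaml "gopkg.in/yaml.v2"
)

func TestRelabelActionsUnmarshal(t *testing.T) {
	// One minimal, valid rule per relabel action.
	rules := map[string]string{
		"replace":   `{source_labels: [a], target_label: b, action: replace}`,
		"keep":      `{source_labels: [a], regex: foo, action: keep}`,
		"drop":      `{source_labels: [a], regex: foo, action: drop}`,
		"hashmod":   `{source_labels: [a], target_label: b, modulus: 8, action: hashmod}`,
		"labelmap":  `{regex: "meta_(.+)", action: labelmap}`,
		"labeldrop": `{regex: foo, action: labeldrop}`,
		"labelkeep": `{regex: foo, action: labelkeep}`,
		"lowercase": `{source_labels: [a], target_label: b, action: lowercase}`,
		"uppercase": `{source_labels: [a], target_label: b, action: uppercase}`,
		"keepequal": `{source_labels: [a], target_label: b, action: keepequal}`,
		"dropequal": `{source_labels: [a], target_label: b, action: dropequal}`,
	}
	for name, rule := range rules {
		t.Run(name, func(t *testing.T) {
			var cfg relabel.Config
			require.NoError(t, yaml.Unmarshal([]byte(rule), &cfg))
			require.Equal(t, relabel.Action(name), cfg.Action)
		})
	}
}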
…rashekar/opentelemetry-operator into rashmi/keepequal-fix
I think I'm ok with this fix, although I'd like to open an issue to find a better solution as well. But I'd like @pavolloffay and @jaronoff97 to give their opinions as well.
Please fix the linter error, otherwise this looks good to me.
Thanks for fixing this, and being patient with my feedback. As I said earlier, I consider this a temporary fix until we can figure out a more robust way of dealing with the problem. However, fixing the bug is more important than other considerations here in my view.
Could you add a comment to the function signature explaining why we need to do this, linking the issue prometheus/prometheus#12534?
Thanks @swiatekm-sumo. Your feedback makes sense to me.
Thanks @jaronoff97. Added the comment. Please take a look.
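For readers of the thread, roughly what such a comment could look like; the function name and body below are placeholders, not the actual code in cmd/otel-allocator/server/server.go:

package server

// unescapeJobConfigs is a placeholder name for illustration only. The point
// is the comment: it should explain that this works around the Prometheus
// marshalling/unmarshalling problem for the keepequal/dropequal relabel
// actions, tracked in https://github.com/prometheus/prometheus/issues/12534,
// and can be removed once that issue is fixed upstream.
func unescapeJobConfigs(jobToScrapeConfig map[string]interface{}) map[string]interface{} {
	// ... workaround elided ...
	return jobToScrapeConfig
}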
Co-authored-by: Jacob Aronoff <[email protected]>
@jaronoff97 - I have committed your suggestion. Please merge if all looks good. Thanks again!
@rashmichandrashekar before I merge, did you confirm this worked by running the TA locally and that the scrape_configs response worked with the collector? If not, it may be helpful to add an e2e here...
@jaronoff97 - Yes, I did test out that this works by running TA and collector locally :)
Fix for unmarshalling error for keepequal/dropequal (#2794)
* fix
* add require
* fixing lint errors
* adding unit tests
* fixing some yaml
* adding few more tests
* fixing lint
* adding comment
* Update cmd/otel-allocator/server/server.go
Co-authored-by: Jacob Aronoff <[email protected]>
Description:
Fixing a bug - the scrape config returned by the target allocator crashes the OTel collector's Prometheus MetricsReceiver when a relabel or metric relabel config uses action: keepequal or action: dropequal.
Link to tracking Issue(s):
#2793
Testing:
Tested that the collector's MetricsReceiver no longer crashes.